
[FA4] Update flash-attention to latest upstream FA4 #38690

Merged
LucasWilkinson merged 2 commits into vllm-project:main from neuralmagic:lwilkinson/update-fa4
Apr 2, 2026

Conversation

@LucasWilkinson
Collaborator

Testing PR for updating FA4 to latest upstream

- Point vllm_flash_attn.cmake to the updated FA branch (95e93d2), which syncs flash_attn/cute/ with upstream Dao-AILab/flash-attention.
- Bump nvidia-cutlass-dsl>=4.4.2 and quack-kernels>=0.3.3 to match upstream FA4 requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
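From the description above, the change likely reduces to a one-line Git tag bump in the CMake external-project file plus two raised version floors in the CUDA requirements file. The following is a sketch only; the file paths, surrounding context, and the replaced ("previous") values are assumptions, not taken from the actual diff:

```diff
# Sketch only -- paths and prior values are assumptions.
--- a/cmake/external_projects/vllm_flash_attn.cmake
+++ b/cmake/external_projects/vllm_flash_attn.cmake
-        GIT_TAG <previous pinned commit>
+        GIT_TAG 95e93d2  # syncs flash_attn/cute/ with upstream Dao-AILab/flash-attention

--- a/requirements/cuda.txt
+++ b/requirements/cuda.txt
-nvidia-cutlass-dsl>=<previous floor>
+nvidia-cutlass-dsl>=4.4.2
-quack-kernels>=<previous floor>
+quack-kernels>=0.3.3
```

Pinning the vendored vllm-flash-attn checkout to an exact commit (rather than a branch head) keeps the build reproducible while the requirement floors ensure the Python-side CUTLASS DSL and quack-kernels packages match what upstream FA4 expects.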
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the vllm-flash-attn Git tag to a newer commit and bumps the minimum versions for nvidia-cutlass-dsl and quack-kernels in the CUDA requirements file. I have no feedback to provide.

@LucasWilkinson added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Apr 1, 2026
@MatthewBonanni
Collaborator

This will fix #36763 thanks to the inclusion of Dao-AILab/flash-attention@0293155

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Collaborator

@MatthewBonanni left a comment


LGTM

@github-project-automation (bot) moved this to Ready in NVIDIA on Apr 2, 2026
@LucasWilkinson changed the title from "[WIP][Do not merge yet] Update flash-attention to latest upstream FA4" to "[FA4] Update flash-attention to latest upstream FA4" on Apr 2, 2026
@LucasWilkinson enabled auto-merge (squash) on April 2, 2026, 14:37
@LucasWilkinson merged commit cb3935a into vllm-project:main on Apr 2, 2026
139 of 140 checks passed
@github-project-automation (bot) moved this from Ready to Done in NVIDIA on Apr 2, 2026
mieshkiwrk pushed a commit to mieshkiwrk/vllm that referenced this pull request Apr 2, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Mieszko Dziadowiec <mdziadowiec@habana.ai>
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 3, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
HenryTangDev pushed a commit to HenryTangMain/vllm that referenced this pull request Apr 6, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
askliar pushed a commit to netanel-haber/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
USTCKAY pushed a commit to USTCKAY/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Song Kai <songkai05@baidu.com>
rishitdholakia13 pushed a commit to rishitdholakia13/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Rishi Puri <riship@nvidia.com>
big-yellow-duck pushed a commit to EmbeddedLLM/vllm that referenced this pull request Apr 8, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
jackcfwang pushed a commit to jackcfwang/vllm that referenced this pull request Apr 10, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: jackcfwang <jackcfwang@tencent.com>
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 10, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
c2w-sea added a commit to coreweave/ml-containers that referenced this pull request Apr 10, 2026
- vLLM: v0.19.0 -> v0.19.1rc0
- FlashInfer: v0.6.6 -> v0.6.7

v0.19.1rc0 is the first release with the FA4 NaN fix (PR #38690)
for Blackwell/SM100 and TRTLLM as default MLA prefill backend.
Eliminates the need for Dockerfile-level FA4 patching.

Refs: vllm-project/vllm#38690, INF-353

Labels

ci/build · nvidia · ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done


2 participants